Introduction Nucleophosmin 1-mutated (NPM1m) acute myeloid leukemia (AML) accounts for ~30% of adult AML and is generally considered favorable risk in the absence of FLT3-ITD mutations (FLT3m). However, outcomes remain substantially heterogeneous, as do clinical features, molecular aberrations, and induction regimens. Current tools lack the precision to predict an individual patient's risk of early death (ED). Here we developed baseline-only machine learning (ML) models to predict ED and to identify the leading clinical and genomic risk drivers in contemporary therapy of NPM1m AML.

Methods We retrospectively reviewed adult NPM1m AML patients (pts) treated at Roswell Park Comprehensive Cancer Center from 2006–2025. Collected variables included demographics, diagnostic laboratory values (white blood cell (WBC) count, platelet count, peripheral and bone marrow blasts), cytogenetic risk, co-mutations, induction regimen, measurable residual disease (MRD) status, and allogeneic hematopoietic stem cell transplant (HSCT). Only baseline pre-treatment features were included for prediction. ED was defined as death within 1 year (yr) of diagnosis. Univariable and multivariable logistic regression were performed to identify predictors of ED. Cox proportional-hazards models estimated hazard ratios (HR) for overall survival (OS). For each clinical endpoint (OS, ED), model selection was performed via nested, stratified cross-validation (CV), comparing L1-penalized logistic regression, gradient boosting, and XGBoost. The final model for each endpoint was the algorithm achieving the highest mean area under the receiver operating characteristic curve (AUROC) across outer CV folds. Model performance was further evaluated using sensitivity, specificity, and calibration metrics.
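The selection rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' pipeline: the function names `stratified_folds` and `select_model`, the labels, and the per-fold AUROC values are all hypothetical, and the inner CV loop used for hyperparameter tuning is omitted.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_folds(labels, k, seed=0):
    """Assign sample indices to k folds so each fold keeps the class balance."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin across folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def select_model(outer_fold_auroc):
    """Return the candidate with the highest mean AUROC across outer folds."""
    means = {name: mean(scores) for name, scores in outer_fold_auroc.items()}
    best = max(means, key=means.get)
    return best, means


# Hypothetical labels (1 = early death, 0 = alive at 1 yr); not study data.
labels = [1] * 10 + [0] * 20
folds = stratified_folds(labels, k=5)

# Hypothetical outer-fold AUROCs for the three candidate algorithms.
outer_fold_auroc = {
    "l1_logistic": [0.80, 0.78, 0.83, 0.79, 0.81],
    "gradient_boosting": [0.84, 0.80, 0.86, 0.82, 0.85],
    "xgboost": [0.83, 0.81, 0.85, 0.80, 0.84],
}
best, means = select_model(outer_fold_auroc)
print(best, round(means[best], 3))
```

Stratifying the folds matters when the event is uncommon, because an unstratified split can leave an outer fold with too few early deaths for a stable AUROC estimate.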

Results A total of 207 pts were evaluated. Median age was 66 years (range 19–96) and 54% were women; most (88.4%) were Caucasian. Median WBC count was 27.5 × 10⁹/L, platelet count 65 × 10⁹/L, marrow blasts 60%, and peripheral blasts 37%. Intermediate-risk cytogenetics were present in 97%, and 20% were diagnosed with secondary AML (sAML). Two-thirds (67%) received intensive chemotherapy induction. The most common co-mutations were FLT3-ITD (39%), DNMT3A (23%), TET2 (15%), IDH2 (14%), and IDH1 (14%); less common mutations (TP53, ASXL1, RUNX1, RAD21, PTPN11, KRAS, NRAS) each occurred in <10%. MRD was assessed by any method after induction in 64% of pts, with 49% achieving MRD negativity. Allogeneic HSCT was performed in 31%, most often in first remission.

Median OS (mOS) for the entire cohort was 24.4 months (mos) (95% CI 15.8–33.8), with 1- and 5-yr OS of 63% and 35%, respectively. Survival improved over time: 5-yr OS rose from 23% pre-2014 to 29% for 2014–2018 and to 45% for 2018–2025. Stratification by age revealed marked differences: pts <65 yrs (n=91) had a mOS of 52.3 mos, with 1-yr OS of 76% and 5-yr OS of 48%. The 116 older pts (≥65 yrs) had a mOS of 14.3 mos, with 1-yr OS of 53% and 5-yr OS of 26%. Pts achieving MRD negativity post-induction had a 5-yr OS of 60%, compared with 18% for MRD-positive pts.

ML models predicted outcomes in NPM1m AML pts at multiple timepoints. The AUROC was 0.74 for 1-yr OS, 0.72 for 3-yr OS, 0.79 for 5-yr OS, 0.73 for complete remission, and 0.88 for ED. The top predictive features were age, performance status, WBC count at diagnosis, induction intensity, and the presence of co-mutations (DNMT3A and FLT3-ITD).
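For readers less familiar with the metric, AUROC can be read as the probability that a randomly chosen patient who had the event receives a higher predicted risk than a randomly chosen patient who did not, with ties counting as one half. The following minimal sketch computes it by direct pairwise comparison; the function name `auroc` and the toy labels and scores are hypothetical and are not study data.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (event, non-event) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # event patient ranked above non-event patient
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy example: 3 patients with the event, 5 without (hypothetical scores).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.2, 0.1]
value = auroc(y_true, scores)
print(round(value, 3))
```

On this reading, an AUROC of 0.88 for ED means that in roughly 88 of 100 such patient pairs the model assigns the higher risk to the patient who died early. AUROC reflects ranking only and says nothing about whether the predicted probabilities are well calibrated.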

These findings were supported by regression analyses. Univariable Cox regression identified age (HR 1.05), performance status (ECOG ≥2; HR 1.74), sAML (HR 2.67), cytogenetics (HR 2.65), ASXL1m (HR 4.25), TP53m (HR 3.68), co-mutated FLT3-ITD and DNMT3A (HR 3.21), and MRD positivity (HR 2.86) as predictors of adverse survival. Multivariable analysis found poor performance status (ECOG ≥2; HR 1.94), sAML (HR 3.1), and co-mutation of FLT3-ITD and DNMT3A to be independently predictive of worse outcomes.

Conclusions ML models using baseline clinical and genomic features provided accurate, individualized risk prediction for early mortality in NPM1m AML pts treated between 2006 and 2025. Age, performance status, WBC count, induction intensity, and DNMT3A/FLT3-ITD co-mutations were most prognostic, as independently validated in multivariable analysis. These findings suggest that ML-driven risk stratification at diagnosis can predict treatment outcomes in NPM1m AML with high accuracy. External validation in independent cohorts is ongoing.
